Real-Time Modulation Enhancement of Temporal Envelopes for Increasing Speech Intelligibility
نویسندگان
چکیده
In this paper, a novel approach is introduced for performing real-time speech modulation enhancement to increase speech intelligibility in noise. The proposed modulation enhancement technique operates independently in the frequency and time domains. In the frequency domain, a compression function is used to perform energy reallocation within a frame. This compression function contains novel scaling operations to ensure speech quality. In the time domain, a mathematical equation is introduced to reallocate energy from the louder to the quieter parts of the speech. This proposed mathematical equation ensures that the long-term energy of the speech is preserved independently of the amount of compression, hence gaining full control of the time-energy reallocation in real-time. Evaluations on intelligibility and quality show that the suggested approach increases the intelligibility of speech while maintaining the overall energy and quality of the speech signal.
منابع مشابه
Modulation Enhancement of Temporal Envelopes for Increasing Speech Intelligibility in Noise
In this paper, speech intelligibility is enhanced by manipulating the modulation spectrum of the signal. First, the signal is decomposed into Amplitude Modulation (AM) and Frequency Modulation (FM) components using a high resolution adaptive quasi-harmonic model of speech. Then, the AM part of midrange frequencies of speech spectrum is modified by applying a transforming function which follows ...
متن کاملExtracting amplitude modulations from speech in the time domain
Natural sounds can be characterised by patterns of changes in loudness (amplitude modulations), and human speech perception studies have focused on the low frequencies contained in the gross temporal structure of speech. Low-pass filtering the temporal envelopes of sub-band filtered speech maintains intelligibility, but it remains unclear how the human auditory system could perform such a modul...
متن کاملOn the Role of Theta-Driven Syllabic Parsing in Decoding Speech: Intelligibility of Speech with a Manipulated Modulation Spectrum
Recent hypotheses on the potential role of neuronal oscillations in speech perception propose that speech is processed on multi-scale temporal analysis windows formed by a cascade of neuronal oscillators locked to the input pseudo-rhythm. In particular, Ghitza (2011) proposed that the oscillators are in the theta, beta, and gamma frequency bands with the theta oscillator the master, tracking th...
متن کاملMonaural Speech Enhancement using Deep Neural Networks by Maximizing a Short-Time Objective Intelligibility Measure
In this paper we propose a Deep Neural Network (DNN) based Speech Enhancement (SE) system that is designed to maximize an approximation of the Short-Time Objective Intelligibility (STOI) measure. We formalize an approximate-STOI cost function and derive analytical expressions for the gradients required for DNN training and show that these gradients have desirable properties when used together w...
متن کاملIntelligibility enhancement of casual speech for reverberant environments inspired by clear speech properties
Clear speech has been shown to have an intelligibility advantage over casual speech in noisy and reverberant environments. This work validates spectral and time domain modifications to increase the intelligibility of casual speech in reverberant environments by compensating particular differences between the two speaking styles. To compensate spectral differences, a frequency-domain filtering a...
متن کامل